/********

Project - College affirmative action bans and health risk behaviors

Dataset - Tobacco Use Supplements of Current Population Survey (CPS)

Version - May 29, 2019 (for replication archive)

Description - This file contains the code used to generate all of the main and supplemental TUS-CPS analyses reported
in the published paper. (The datafile used for analysis was created as outlined in "datasetup_TUSCPS.do".)
We have heavily annotated this file to transparently guide the reader on specific choices made in the analysis.

To run this analysis, the user will need to install the following Stata routines:
-outreg2-
-reghdfe-

We use -reghdfe- here instead of -areg- because it runs slightly faster on these data, whereas in the YRBS data we use -areg- because
it is faster there. This difference may be due to the larger sample size and additional fixed effects in the TUS-CPS models. Either way,
the user can verify that the results are unchanged whichever command is used (in the YRBS, the standard errors are even a bit smaller).

********/

***SET FILE PATHS AND CALL DATASET

global data "[YOUR FILEPATH]"  /*calls in all data used for the project */
global outreg "[YOUR FILEPATH]"  /*file to house data output*/

use "$data/TUSCPS_analysis_datafile.dta", clear

***********************************************************
******CREATE AFFIRMATIVE ACTION BAN EXPOSURE MEASURES
***********************************************************

/*
The idea here is to capture the period when adult respondents were in their junior or senior years of high school, and to assign 
exposure to affirmative action bans on this basis. The TUS-CPS data sample includes adults ages 19-40 (an age range we chose to
capture individuals who were most likely no longer high school students and who were entering and exiting high school during the
period when affirmative action bans were implemented). We use data on survey year and self-reported age to estimate the birth year,
to which we add 16 to capture individuals who were 16-17 years of age (i.e., juniors and seniors in high school) at the time of affirmative action 
ban implementation. We assign ban exposure on the basis of whether the individual was 16-17 years of age or younger at the time a ban 
was implemented in their current state of residence. Using current state of residence as a proxy for the state where an individual attended
high school is necessary because the CPS does not include retrospective measures of migration. This likely introduces some measurement error, but 
recent work has shown that this procedure does not bias estimates of the long-run effects of exposures occurring in adolescence and young 
adulthood (e.g., https://www.nber.org/papers/w25141). 
*/
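
*Illustrative sketch (toy data; kept inside a comment so it does not run with the replication). It applies the
*exposure rule above to three hypothetical California respondents (stfips 06, ban implemented in 1998): the
*exposure year is survey year - age + 16, and a respondent counts as exposed when that year is 1998 or later.
*The variable names year_16 and exposed are hypothetical.
/*
preserve
clear
input cps_year age stfips
2003 25 6
2003 20 6
2010 30 6
end
gen year_16 = (cps_year - age) + 16   // 1994, 1999, 1996
gen exposed = (year_16 >= 1998)       // 0, 1, 0
list, clean noobs
count if exposed
scalar sk_exposed = r(N)              // 1
restore
*/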

*First, generate the year the individual was 16-17 years old (=year)

rename year cps_year /*this is the year the TUS-CPS wave was conducted*/
gen year = (cps_year - age) + 16 /*adding 16 ensures the individual is at least 16; individuals could also be 17 depending on when in the year the TUS-CPS was conducted; further accounting for this makes no difference */


*Now, create the affirmative action ban exposure variables using the same procedure as in YRBS_estimates_May1_2019

**Main binary variable

gen aa_jel2016=.
replace aa_jel2016=0 
replace aa_jel2016=1 if stfips==48 & year>=1997
replace aa_jel2016=0 if stfips==12 & year<2001
replace aa_jel2016=1 if stfips==12 & year>=2001
replace aa_jel2016=0 if stfips==06 & year<1998
replace aa_jel2016=1 if stfips==06 & year>=1998
replace aa_jel2016=0 if stfips==53 & year<1999
replace aa_jel2016=1 if stfips==53 & year>=1999
replace aa_jel2016=0 if stfips==26 & year<2006
replace aa_jel2016=1 if stfips==26 & year>=2006
replace aa_jel2016=0 if stfips==04 & year<2010
replace aa_jel2016=1 if stfips==04 & year>=2010
replace aa_jel2016=0 if stfips==33 & year<2012
replace aa_jel2016=1 if stfips==33 & year>=2012
replace aa_jel2016=0 if stfips==40 & year<2013
replace aa_jel2016=1 if stfips==40 & year>=2013
replace aa_jel2016=0 if stfips==31 & year<2008
replace aa_jel2016=1 if stfips==31 & year>=2008
sort year

*(Note - Texas had a favorable court ruling in 2003 that allowed for the reinstitution of affirmative action programs.
*However, many colleges still continued without affirmative action programs, so we consider bans to be active for all years
*from 1997 onward. In a specification check below, we use the recoded measure aa_alt, under which individuals who turned 16 
*in Texas in 2003 or later are considered unexposed, and find similar results.)

gen aa_alt = aa_jel
replace aa_alt = 0 if statefip==48&year>=2003

**Event study variable

gen year_ban=.
replace year_ban=1997 if stfips==48
replace year_ban=2001 if stfips==12
replace year_ban=1998 if stfips==06
replace year_ban=1999 if stfips==53
replace year_ban=2006 if stfips==26
replace year_ban=2008 if stfips==31
replace year_ban=2010 if stfips==04
replace year_ban=2012 if stfips==33
replace year_ban=2013 if stfips==40

tab year_ban, missing
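
*Illustrative sketch (toy data; kept inside a comment so it does not run with the replication). Because Stata
*treats a missing value as larger than any number, the binary measure can be written compactly as
*(year >= year_ban): it is 0 for states without a ban, where year_ban is missing. The toy rows and the variable
*name aa_compact are hypothetical; on the analysis data the same expression could be compared with aa_jel2016.
/*
preserve
clear
input stfips year year_ban
6 1997 1998
6 1998 1998
36 2005 .
end
gen aa_compact = (year >= year_ban)   // 0, 1, 0 (2005 >= . is false)
list, clean noobs
count if aa_compact
scalar sk_aa_compact = r(N)           // 1
restore
*/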

gen aa_event=.
replace aa_event=year-year_ban
replace aa_event=. if year_ban==.
tab aa_event, missing

*Bin event time in two-year increments, to match the YRBS analysis and because the TUS-CPS is not fielded annually.
*Bin event years at or before -7, and at or after +6, to ensure relative balance across bins (this does not change the main estimates).

gen aa2 = aa_event
recode aa2 (min/-7 = .) (7/max = .) 
recode aa2 (-6/-5 = -5.5) (-4/-3 = -3.5) (-2/-1 = -1.5) (0/1 = 0.5) (2/3 = 2.5) (4/5 = 4.5) (6/7 = 6.5)
tab aa_event, gen(aa_)

mark banst if year_ban~=.
tab year banst
tab year banst if race_min==2

mark event_time1 if aa_event<=-7
mark event_time2 if (aa_event>=-6&aa_event<=-5)
mark event_time3 if (aa_event>=-4&aa_event<=-3)
mark event_time4 if ( aa_event>=-2&aa_event<=-1)
mark event_time5 if (aa_event>=0&aa_event<=1)
mark event_time6 if aa_event>=2&aa_event<=3
mark event_time7 if aa_event>=4&aa_event<=5
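mark event_time8 if aa_event>=6 & aa_event<. /*exclude missing aa_event (states without a ban): Stata treats missing as larger than any number*/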
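
*Illustrative sketch (toy data; kept inside a comment so it does not run with the replication). Stata treats a
*missing value as larger than any number, so an open-ended top bin written only as aa_event>=6 is also true when
*aa_event is missing (states without a ban); adding aa_event<. keeps those observations out. Lower-bounded-only
*bins such as aa_event<=-7 are unaffected. The variable names bottom, top_naive and top_safe are hypothetical.
/*
preserve
clear
input aa_event
-8
-7
-6
5
6
12
.
end
gen bottom    = (aa_event <= -7)                // 1 for -8 and -7; missing is excluded automatically
gen top_naive = (aa_event >= 6)                 // 1 for 6, 12 and the missing value
gen top_safe  = (aa_event >= 6 & aa_event < .)  // 1 for 6 and 12 only
count if bottom
scalar sk_bottom = r(N)                         // 2
count if top_naive
scalar sk_top_naive = r(N)                      // 3
count if top_safe
scalar sk_top_safe = r(N)                       // 2
restore
*/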

*Identify states bordering those with affirmative action bans. This is for a specification check (below) where we allow for cross-border 
*spillovers; i.e., students in Nevada, for example, may be affected by bans in California given their high application rates
*to the University of California and California state institutions.

gen border=.
replace border=0
replace border=1 if stfips==35 & year>=1997
replace border=1 if stfips==40 & year>=1997
replace border=1 if stfips==05 & year>=1997
replace border=1 if stfips==22 & year>=1997
replace border=1 if stfips==01 & year>=2001
replace border=1 if stfips==13 & year>=2001
replace border=1 if stfips==41 & year>=1998
replace border=1 if stfips==32 & year>=1998
replace border=1 if stfips==04 & year>=1998
replace border=1 if stfips==41 & year>=1999
replace border=1 if stfips==16 & year>=1999
replace border=1 if stfips==01 & year>=2002
replace border=1 if stfips==12 & year>=2002
replace border=1 if stfips==47 & year>=2002
replace border=1 if stfips==45 & year>=2002
replace border=1 if stfips==37 & year>=2002
replace border=1 if stfips==55 & year>=2006
replace border=1 if stfips==39 & year>=2006
replace border=1 if stfips==18 & year>=2006
replace border=1 if stfips==06 & year>=2010
replace border=1 if stfips==08 & year>=2010
replace border=1 if stfips==32 & year>=2010
replace border=1 if stfips==35 & year>=2010
replace border=1 if stfips==49 & year>=2010
replace border=1 if stfips==50 & year>=2012
replace border=1 if stfips==23 & year>=2012
replace border=1 if stfips==25 & year>=2012
replace border=1 if stfips==48 & year>=2013
replace border=1 if stfips==35 & year>=2013
replace border=1 if stfips==08 & year>=2013
replace border=1 if stfips==20 & year>=2013
replace border=1 if stfips==29 & year>=2013
replace border=1 if stfips==05 & year>=2013
replace border=1 if stfips==08 & year>=2008
replace border=1 if stfips==20 & year>=2008
replace border=1 if stfips==56 & year>=2008
replace border=1 if stfips==46 & year>=2008
replace border=1 if stfips==19 & year>=2008
replace border=1 if stfips==29 & year>=2008
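
*Illustrative sketch (toy data; kept inside a comment so it does not run with the replication). The border coding
*above can also be expressed as a crosswalk of (bordering state, adjacent ban year) pairs, keeping the earliest year
*for states next to more than one ban state (e.g., Oregon, stfips 41, borders California from 1998 and Washington
*from 1999). The crosswalk layout and the names border_start and border_toy are hypothetical; only a few pairs are shown.
/*
preserve
clear
input stfips border_start
41 1998
41 1999
32 1998
32 2010
end
collapse (min) border_start, by(stfips)   // earliest adjacent ban year per state
tempfile xwalk
save `xwalk'
clear
input stfips year
41 1997
41 1998
32 2005
36 2005
end
merge m:1 stfips using `xwalk', keep(master match) nogenerate
gen border_toy = (year >= border_start)   // missing border_start (no adjacent ban) gives 0
list, clean noobs
count if border_toy
scalar sk_border = r(N)                   // 2: Oregon in 1998 and Nevada in 2005
restore
*/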

*Exclude from the estimation sample states that considered, but did not pass, affirmative action bans over a long period of time;
*specifically, states that had multi-year litigation around affirmative action (Alabama, Georgia, Louisiana and Mississippi).
*We note that Colorado considered an affirmative action ban in a 2008 voter initiative that was narrowly defeated. We
*do not exclude Colorado, however, because the ban was considered only over a short period (<1 year). (The reader can verify that
*excluding Colorado does not change the results at all.)
 
gen est_sample = 1
recode est_sample (1=.) if stfip==1|stfip==13|stfip==22|stfip==28

*DEFINE SAMPLE AND GLOBALS
/*
We include all individuals who turned 16 in 1990 or later (year>=1990) to match the YRBS. We chose 1990 (over 1991) 
since individuals who were 16 in 1990 would likely still have been in high school in 1991 (i.e., the 1991 YRBS wave includes individuals who
were both juniors and seniors, the latter of whom would have been ~16 years old the year before).
*/
global sample_est "est_samp==1&age>=19&age<=30&year>=1990" 
global policies "ln_real_pci ln_cigtax ln_beertax unemp"
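
*Illustrative sketch (toy data; kept inside a comment so it does not run with the replication). With year =
*survey year - age + 16, the year>=1990 restriction in sample_est can only bind, for ages 19-30, in waves fielded
*before 2004: a 30-year-old interviewed in 2003 turned 16 in 1989 and is excluded, while one interviewed in 2004
*turned 16 in 1990. The name in_sample_toy is hypothetical, and the est_sample state restriction is ignored here.
/*
preserve
clear
input cps_year age
1992 19
2003 29
2003 30
2015 30
end
gen year = (cps_year - age) + 16                       // 1989, 1990, 1989, 2001
gen in_sample_toy = (age>=19 & age<=30 & year>=1990)   // 0, 1, 0, 1
list, clean noobs
count if in_sample_toy
scalar sk_in_sample = r(N)                             // 2
restore
*/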

******************************
*****SUMMARY STATISTICS ******
******************************

**TABLE 1 (SECOND TWO COLUMNS) - SAMPLE DESCRIPTIVES

log using "$outreg/Table_1_samplestats_TUSCPS", replace

quietly: xi: reghdfe curr_smoke aa_jel if race_min==2&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
mark sample if e(sample)

bysort banst: sum curr_smoke age sex hs coll if sample==1&race_min==2 [aw = wtfinl]

bysort banst: tab sex  if sample==1&race_min==2 
bysort banst: tab sex  if sample==1&race_min==2 [aw = wtfinl]

bysort banst: tab race_grp if sample==1&race_min==2 
bysort banst: tab race_grp if sample==1&race_min==2 [aw = wtfinl]

log close

**********************************
******MAIN DD REGRESSION ANALYSES*
**********************************

**TABLE 2 (LAST COLUMN) 

/*

Note - we maintain the same specification as the YRBS analysis, with one main difference: we also include fixed effects 
for survey year (cps_year). 

*/
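
*For reference, a sketch of the DD model estimated below, read off the -reghdfe- call (the notation is ours, not
*the paper's). For respondent i in state s who was about 16 in year t:
*
*   Y_{i} = \beta AA_{st} + \alpha_s + \delta_t + \gamma_s t + \lambda_{c} + \kappa_{a} + \phi_{g} + \eta_{rt} + \mu_{m} + \varepsilon_{i}
*
*where Y_i is curr_smoke and AA_{st} is aa_jel2016; \alpha_s, \delta_t and \gamma_s t are state effects, exposure-year
*effects and state-specific linear trends in t (statefip, year, statefip#c.year); \lambda_c, \kappa_a, \phi_g, \eta_{rt}
*and \mu_m are fixed effects for survey year (cps_year), age, sex, exposure year by race group (year#race_grp) and
*survey month (month). \beta is the DD estimate, and standard errors are clustered by state.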

forvalues x = 1/2 {
	
	xi: reghdfe curr_smoke aa_jel if race_min==`x'&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun", excel stats(coef ci pval se) 

	}
	
*************************************************************************
******ADDITIONAL ANALYSIS AND SPECIFICATION CHECKS FOR MAIN DD MODEL*****
*************************************************************************

** S4 TABLE - RACE/ETHNICITY AND GENDER SPECIFIC REGRESSIONS

*Race specific: Numerical race categories denoted in race_grp correspond exactly to YRBS-race categories

forvalues x = 1/3 {
	
	xi: reghdfe curr_smoke aa_jel if race_grp==`x'&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_RACE_Longrun", excel stats(coef ci pval se) 

	}

*Gender specific

forvalues x = 1/2 {

	xi: reghdfe curr_smoke aa_jel if sex==`x'&$sample_est&race_min==2, abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_GENDER_Longrun", excel stats(coef ci pval se) 

	}
	
**S5 TABLE - SPECIFICATION CHECKS

** ROBUSTNESS (MAIN RESULTS)
	
	*Add in state-year covariates
	xi: reghdfe curr_smoke aa_jel $policies if race_min==2&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp  month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun_ROBUST_URM", excel stats(coef ci pval se) 
	
	*Allow for separate effects in border states
	xi: reghdfe curr_smoke aa_jel border if race_min==2&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp month )  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun_ROBUST_URM", excel stats(coef ci pval se) 

	*Use recoded variable which specifies no exposure in Texas after 2003
	xi: reghdfe curr_smoke aa_alt if race_min==2&$sample_est&(statefip~=48&statefip~=26), abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun_ROBUST_URM", excel stats(coef ci pval se) 

	*Restrict to only those states that implemented bans
	xi: reghdfe curr_smoke aa_jel if race_min==2&$sample_est&banst==1, abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun_ROBUST_URM", excel stats(coef ci pval se) 
	

	/* This is not in the main paper, but here are the same checks for the non-Hispanic White sample 

	*Add in state-year covariates
	xi: reghdfe curr_smoke aa_jel $policies if race_min==1&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp  month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun_ROBUST_White", excel stats(coef ci pval se) 
	
	*Allow for separate effects in border states
	xi: reghdfe curr_smoke aa_jel border if race_min==1&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp month )  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun_ROBUST_White", excel stats(coef ci pval se) 

	*Use recoded variable which specifies no exposure in Texas after 2003
	xi: reghdfe curr_smoke aa_alt if race_min==1&$sample_est&(statefip~=48&statefip~=26), abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun_ROBUST_White", excel stats(coef ci pval se) 

	*Restrict to only those states that implemented bans
	xi: reghdfe curr_smoke aa_jel if race_min==1&$sample_est&banst==1, abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun_ROBUST_White", excel stats(coef ci pval se) 
	
	*/
	
***S8 TABLE - Including Weights
/*
Note - We follow Solon et al. (2015) and Winship and Radbill (1994) and do NOT include weights in the main
model, for several reasons:

- Per the YRBS website, weights are based on sex, grade, race and location; we control for all of these in our model in a flexible manner
- In this case, OLS is BLUE: there is no evidence of selective sampling on the dependent variable (which we also test for above)
- Weighting to correct for heteroskedasticity is dominated by the use of robust standard errors when both OLS and WLS are consistent
- Weighting can reduce precision in situations where individual errors are clustered within states
- The results of the weighted analyses bear this out
*/

	xi: reghdfe curr_smoke aa_jel if race_min==2&$sample_est [aw = wtfinl], abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_Longrun_ROBUST_Sept252018", excel stats(coef ci pval se) 

	
*********************************************************
******EVENT STUDY ESTIMATES - FIGURE 2 and S2 FIGURE*****
*********************************************************

*Generate event study estimates for URM, collect coef
xi: reghdfe curr_smoke event_time1-event_time3 event_time5-event_time8 if race_min==2&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
gen y_u_coef4 = .  /*event_time4 (-2/-1) is the omitted reference bin*/
gen y_u_sd4 = .

foreach i of numlist 1/3 5/8 {
	gen y_u_coef`i' = _b[event_time`i']
	gen y_u_sd`i' = _se[event_time`i']
}

*Generate event study estimates for Non-Hispanic White, collect coef
xi: reghdfe curr_smoke event_time1-event_time3 event_time5-event_time8 if race_min==1&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp month)  cluster(statefip)
gen y_w_coef4 = .  /*event_time4 (-2/-1) is the omitted reference bin*/
gen y_w_sd4 = .

foreach i of numlist 1/3 5/8 {
	gen y_w_coef`i' = _b[event_time`i']
	gen y_w_sd`i' = _se[event_time`i']
}


*Collect coefficients and standard errors
keep y_u* y_w*
collapse (mean) *
gen x=1
reshape long y_u_coef y_u_sd y_w_coef y_w_sd , i(x) j(year)
drop x	
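
*Illustrative sketch (toy data; kept inside a comment so it does not run with the replication). The collapse/reshape
*step above works because each y_u_coef# and y_w_coef# variable is a constant across observations: collapsing by the
*mean leaves one row holding those constants, and -reshape long- turns the numeric suffix into a row index (year =
*event-time bin). Two toy bins with hypothetical values stand in for the eight used above.
/*
preserve
clear
set obs 3                            // stands in for the estimation sample
gen y_u_coef1 = 0.01                 // constants, like the _b[] assignments above
gen y_u_coef2 = -0.02
collapse (mean) y_u_coef1 y_u_coef2  // one row; the mean of a constant is the constant
gen x = 1
reshape long y_u_coef, i(x) j(year)  // one row per bin, indexed by year = 1, 2
list, clean noobs
scalar sk_rows = _N                  // 2
restore
*/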

save "$outreg/temp_cps.dta", replace
	
use "$outreg/temp_cps.dta", clear

*Generate 95% CIs
gen upper_u = y_u_coef + 1.96*y_u_sd
gen lower_u = y_u_coef - 1.96*y_u_sd	
drop y_u_sd

gen upper_w = y_w_coef + 1.96*y_w_sd
gen lower_w = y_w_coef - 1.96*y_w_sd	
drop y_w_sd
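
*Illustrative sketch (hypothetical numbers; kept inside a comment so it does not run with the replication). The 1.96
*multiplier above is the 97.5th percentile of the standard normal, invnormal(0.975). Stata's regression commands,
*including -reghdfe-, report t-based intervals instead (with clustered errors the degrees of freedom are typically the
*number of clusters minus one), which are slightly wider. The 40 clusters (df = 39) below are a hypothetical value,
*not the number of states in this sample.
/*
scalar sk_coef  = 0.03                    // hypothetical coefficient
scalar sk_se    = 0.01                    // hypothetical standard error
scalar sk_z     = invnormal(0.975)        // about 1.96
scalar sk_upper = sk_coef + sk_z*sk_se    // about 0.0496
scalar sk_lower = sk_coef - sk_z*sk_se    // about 0.0104
scalar sk_t     = invttail(39, 0.025)     // about 2.02, t critical value with 39 df
display "95% CI (normal): [" sk_lower ", " sk_upper "]"
display "critical values: z = " sk_z ", t = " sk_t
*/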


recode y_* upper* lower* (.=0) if year==4

label define event_t 1 "-7+" 2 "-6/-5" 3 "-4/-3" 4 "-2/-1" 5 "0/1" 6 "2/3" 7 "4/5" 8 "6+"
label values year event_t

*Figure 2
twoway (rline upper_u lower_u year if year<4, lcolor(navy) lpattern(dot)) ///
	(rline upper_u lower_u year if year>=5, lcolor(navy) lpattern(dot)) ///	
    (scatter y_u_coef year if year<=4, msymbol(diamond) msize(medlarge) recast(connected) lstyle(solid) lcolor(navy) mcolor(navy)) ///
	(scatter y_u_coef year if year>=5, msymbol(diamond) msize(medlarge) recast(connected) lstyle(solid) lcolor(navy) mcolor(navy)) ///	
	, legend(off) xtitle("Years Relative to Ban Implementation") xlabel(,labsize(medsmall)) ytitle("Coef Estimate") ylabel(,labsize(medsmall)) xline(4.5, lpattern(dash) lcolor(gs4)) yline(0, lpattern(dash) lcolor(gs4)) xlabel(1(1)8, valuelabel) ylabel(-0.1(0.05)0.15) graphregion(color(white)) bgcolor(white)

	graph save "$outreg/urm_cps_smoke", replace
	graph export "$outreg/urm_cps_smoke.tif", replace width(2550)


*S2 Figure
twoway (rline upper_w lower_w year if year<4, lcolor(maroon) lpattern(dot)) ///
	(rline upper_w lower_w year if year>=5, lcolor(maroon) lpattern(dot)) ///	
    (scatter y_w_coef year if year<=4, msymbol(square) msize(medlarge) recast(connected) lstyle(solid) lcolor(maroon) mcolor(maroon)) ///
	(scatter y_w_coef year if year>=5, msymbol(square) msize(medlarge) recast(connected) lstyle(solid) lcolor(maroon) mcolor(maroon)) ///	
	, legend(off) xtitle("Years Relative to Ban Implementation") xlabel(,labsize(medsmall)) ytitle("Coef Estimate") ylabel(,labsize(medsmall)) xline(4.5, lpattern(dash) lcolor(gs4)) yline(0, lpattern(dash) lcolor(gs4)) xlabel(1(1)8, valuelabel) ylabel(-0.1(0.05)0.15) graphregion(color(white)) bgcolor(white)

	graph save "$outreg/white_cps_smoke", replace
	graph export "$outreg/white_cps_smoke.tif", replace width(2550)

*Erase temporary data
erase "$outreg/temp_cps.dta"
		
*********************************************************
******FALSIFICATION EXERCISE - S6 TABLE******************
*********************************************************

clear

*RE-OPEN MAIN DATASET

use "$data/TUSCPS_analysis_datafile.dta", clear


/*DEFINE FALSIFICATION EXPOSURE:

Specifically, we now assign exposure on the basis of the year the individual was 19, which is most likely after high school completion and therefore 
at a point when college affirmative action bans should be less salient.

*/
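
*Illustrative sketch (toy data; kept inside a comment so it does not run with the replication). Shifting the exposure
*age from 16 to 19 changes the coding only for respondents who turned 16 before a ban year but 19 in or after it. Three
*hypothetical California respondents (ban implemented in 1998) interviewed in 2003 show this; the variable names
*year16, year19, aa16 and aa19 are hypothetical.
/*
preserve
clear
input cps_year age
2003 22
2003 25
2003 20
end
gen year16 = (cps_year - age) + 16   // 1997, 1994, 1999
gen year19 = (cps_year - age) + 19   // 2000, 1997, 2002
gen aa16 = (year16 >= 1998)          // 0, 0, 1
gen aa19 = (year19 >= 1998)          // 1, 0, 1
list, clean noobs
count if aa16
scalar sk_aa16 = r(N)                // 1
count if aa19
scalar sk_aa19 = r(N)                // 2
restore
*/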

rename year cps_year /*this is the year the TUS-CPS wave was conducted*/
gen year = (cps_year - age) + 19 

*Generate AA ban variable based on this exposure

gen aa_jel2016_curr=.
replace aa_jel2016_curr=0 
replace aa_jel2016_curr=1 if stfips==48 & year>=1997
replace aa_jel2016_curr=0 if stfips==12 & year<2001
replace aa_jel2016_curr=1 if stfips==12 & year>=2001
replace aa_jel2016_curr=0 if stfips==06 & year<1998
replace aa_jel2016_curr=1 if stfips==06 & year>=1998
replace aa_jel2016_curr=0 if stfips==53 & year<1999
replace aa_jel2016_curr=1 if stfips==53 & year>=1999
replace aa_jel2016_curr=0 if stfips==26 & year<2006
replace aa_jel2016_curr=1 if stfips==26 & year>=2006
replace aa_jel2016_curr=0 if stfips==04 & year<2010
replace aa_jel2016_curr=1 if stfips==04 & year>=2010
replace aa_jel2016_curr=0 if stfips==33 & year<2012
replace aa_jel2016_curr=1 if stfips==33 & year>=2012
replace aa_jel2016_curr=0 if stfips==40 & year<2013
replace aa_jel2016_curr=1 if stfips==40 & year>=2013
replace aa_jel2016_curr=0 if stfips==31 & year<2008
replace aa_jel2016_curr=1 if stfips==31 & year>=2008
sort year

*Mark estimation sample

gen est_sample = 1
recode est_sample (1=.) if stfip==1|stfip==13|stfip==22|stfip==28
global sample_est "est_samp==1&age>=19&age<=30&year>=1990"


*Run Model
forvalues x = 1/2 {

	xi: reghdfe curr_smoke aa_jel if race_min==`x'&$sample_est, abs(statefip year statefip#c.year cps_year age sex year#race_grp )  cluster(statefip)
	outreg2 using "$outreg/aa_ban_CPSTUS_DD_FALSE_Longrun", excel stats(coef ci pval se) 

	}
	
